materials science
Publication Trend Analysis and Synthesis via Large Language Model: A Case Study of Engineering in PNAS
Smetana, Mason, Khazanovich, Lev
Scientific literature is increasingly siloed by complex language, static disciplinary structures, and potentially sparse keyword systems, making it cumbersome to capture the dynamic nature of modern science. This study addresses these challenges by introducing an adaptable large language model (LLM)-driven framework to quantify thematic trends and map the evolving landscape of scientific knowledge. The approach is demonstrated over a 20-year collection of more than 1,500 engineering articles published by the Proceedings of the National Academy of Sciences (PNAS), marked for their breadth and depth of research focus. A two-stage classification pipeline first establishes a primary thematic category for each article based on its abstract. The subsequent phase performs a full-text analysis to assign secondary classifications, revealing latent, cross-topic connections across the corpus. Traditional natural language processing (NLP) methods, such as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), confirm the resulting topical structure and also suggest that standalone word-frequency analyses may be insufficient for mapping fields with high diversity. Finally, a disjoint graph representation between the primary and secondary classifications reveals implicit connections between themes that may be less apparent when analyzing abstracts or keywords alone. The findings show that the approach independently recovers much of the journal's editorially embedded structure without prior knowledge of its existing dual-classification schema (e.g., biological studies also classified as engineering). This framework offers a powerful tool for detecting potential thematic trends and providing a high-level overview of scientific progress.
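As a point of reference for the word-frequency baselines mentioned above, a minimal TF-IDF implementation (a generic sketch, not the paper's LLM pipeline; the function name and toy corpus are illustrative) shows why terms shared across an entire corpus carry no discriminative weight, one reason standalone frequency analyses can fall short in highly diverse fields:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF vectors for a list of tokenized documents.

    tf  = term count / document length
    idf = log(N / document frequency)
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: one count per doc
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({t: (c / length) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors

# Toy corpus: "model" appears in every document, so its idf is log(3/3) = 0
# and it contributes nothing to distinguishing the three topics.
docs = [["model", "bridge", "concrete"],
        ["model", "protein", "folding"],
        ["model", "battery", "electrode"]]
vecs = tf_idf(docs)
```

The corpus-wide term scores exactly zero, while topic-specific terms keep positive weight.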
- North America > United States > New Jersey > Atlantic County > Atlantic City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (3 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Health Care Technology (0.69)
- Health & Medicine > Therapeutic Area (0.68)
- Energy (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
MatWheel: Addressing Data Scarcity in Materials Science Through Synthetic Data
Li, Wentao, Chen, Yizhe, Qiu, Jiangjie, Wang, Xiaonan
Data scarcity and the high cost of annotation have long been persistent challenges in materials science. Inspired by its success in fields such as computer vision, we propose the MatWheel framework, which trains a material property prediction model on synthetic data generated by a conditional generative model. We explore two scenarios: fully-supervised and semi-supervised learning. Using CGCNN for property prediction and Con-CDVAE as the conditional generative model, we conduct experiments on two data-scarce material property datasets from the Matminer database. Results show that synthetic data has potential in extreme data-scarcity scenarios, achieving performance close to or exceeding that of real samples in both tasks. We also find that pseudo-labels have little impact on generated data quality. Future work will integrate advanced models and optimize generation conditions to boost the effectiveness of the materials data flywheel.
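The condition-generate-augment loop behind such a data flywheel can be sketched generically. This is a minimal sketch under my own assumptions: `augment_with_synthetic` and `toy_generator` are hypothetical stand-ins, not MatWheel's actual CGCNN/Con-CDVAE pipeline:

```python
import random

def augment_with_synthetic(real, generate, n_synthetic, seed=0):
    """Augment a scarce labeled set with conditionally generated samples.

    real: list of (features, property_value) pairs
    generate: callable(target_property, rng) -> features; a stand-in for a
              conditional generative model such as Con-CDVAE
    """
    rng = random.Random(seed)
    lo = min(y for _, y in real)
    hi = max(y for _, y in real)
    synthetic = []
    for _ in range(n_synthetic):
        # Condition on a property value drawn from the real label range,
        # mimicking targeted generation for data-scarce regions.
        target = rng.uniform(lo, hi)
        synthetic.append((generate(target, rng), target))
    return real + synthetic

# Hypothetical generator: noisy features loosely tied to the target property.
def toy_generator(target, rng):
    return [target + rng.gauss(0, 0.1) for _ in range(3)]

real_data = [([0.1, 0.2, 0.3], 1.0), ([0.4, 0.5, 0.6], 2.0)]
augmented = augment_with_synthetic(real_data, toy_generator, n_synthetic=8)
```

A property predictor would then be fit on `augmented` instead of the two real samples alone.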
Foundational Large Language Models for Materials Research
Mishra, Vaibhav, Singh, Somaditya, Ahlawat, Dhruv, Zaki, Mohd, Bihani, Vaibhav, Grover, Hargun Singh, Mishra, Biswajit, Miret, Santiago, Mausam, Krishnan, N. M. Anoop
Materials discovery and development are critical for addressing global challenges. Yet, the exponential growth of materials science literature, comprising vast amounts of textual data, has created significant bottlenecks in knowledge extraction, synthesis, and scientific reasoning. Large Language Models (LLMs) offer unprecedented opportunities to accelerate materials research through automated analysis and prediction. Still, their effective deployment requires domain-specific adaptation for understanding and solving domain-relevant tasks. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models on an extensive corpus of materials literature and crystallographic data. Through systematic evaluation, we demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities. The specialized LLaMat-CIF variant demonstrates unprecedented capabilities in crystal structure generation, predicting stable crystals with high coverage across the periodic table. Intriguingly, despite LLaMA-3's superior performance compared to LLaMA-2, we observe that LLaMat-2 demonstrates unexpectedly enhanced domain-specific performance across diverse materials science tasks, including structured information extraction from text and tables and, most notably, crystal structure generation, suggesting a potential adaptation rigidity in overtrained LLMs. Altogether, the present work demonstrates the effectiveness of domain adaptation for developing practically deployable LLM copilots for materials research. Beyond materials science, our findings reveal important considerations for domain adaptation of LLMs, such as model selection, training methodology, and domain-specific performance, which may influence the development of specialized scientific AI systems.
- Asia (0.67)
- North America > Canada (0.28)
- Energy > Oil & Gas (0.67)
- Education > Curriculum > Subject-Specific Education (0.67)
- Materials > Chemicals (0.46)
HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
Bhattarai, Manish, Barron, Ryan, Eren, Maksim, Vu, Minh, Grantcharov, Vesselin, Boureima, Ismael, Stanev, Valentin, Matuszek, Cynthia, Valtchinov, Vladimir, Rasmussen, Kim, Alexandrov, Boian
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
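The abstract does not specify HEAL's exact loss, but one plausible ingredient of level/depth-wise hierarchical penalties can be sketched: weight a pairwise penalty by how many levels two label paths share from the root. The function names and the weighting scheme below are my assumptions, not HEAL's published formulation:

```python
def hierarchy_similarity(path_a, path_b):
    """Fraction of levels on which two hierarchical label paths agree,
    counted from the root down, e.g. ["IT", "AI", "NLP"]."""
    depth = max(len(path_a), len(path_b))
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared / depth

def hierarchical_penalty(distance, path_a, path_b):
    """Penalize embedding distance more strongly the closer two labels
    sit in the hierarchy: same-branch pairs should embed nearby."""
    return hierarchy_similarity(path_a, path_b) * distance

a = ["IT", "AI", "NLP"]
b = ["IT", "AI", "Vision"]
c = ["Health", "Oncology"]
```

Under this weighting, two AI subtopics at the same embedding distance are penalized more than an AI/healthcare pair, pulling siblings together level by level.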
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model
Ye, Yanpeng, Ren, Jie, Wang, Shaozhou, Wan, Yuwei, Wang, Haofen, Razzak, Imran, Xie, Tong, Zhang, Wenjie
Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges for efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. Addressing these challenges, the integration of artificial intelligence with materials science has opened avenues for accelerating the discovery process, though it also demands precise annotation, data extraction, and traceability of information. To tackle these issues, this article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques integrated with large language models to extract and systematically organize a decade's worth of high-quality research into structured triples, comprising 162,605 nodes and 731,772 edges. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology, thus enhancing data usability and integration. By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods. This structured approach not only streamlines materials research but also lays the groundwork for more sophisticated science knowledge graphs.
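The triple-plus-link-prediction idea can be illustrated with one of the simplest network-based heuristics: score a candidate link by the number of neighbors two nodes share. The materials and relations below are illustrative examples, not entries from MKG, and MKG's actual prediction algorithms are not specified in the abstract:

```python
# Toy triple store in the (head, relation, tail) form knowledge graphs use.
triples = [
    ("LiFePO4", "hasApplication", "battery cathode"),
    ("LiCoO2", "hasApplication", "battery cathode"),
    ("LiCoO2", "hasApplication", "energy storage"),
    ("graphene", "hasApplication", "supercapacitor"),
    ("graphene", "hasProperty", "high conductivity"),
]

def neighbors(graph_triples, node):
    """All nodes connected to `node`, ignoring edge direction."""
    out = set()
    for h, _, t in graph_triples:
        if h == node:
            out.add(t)
        elif t == node:
            out.add(h)
    return out

def common_neighbor_score(graph_triples, a, b):
    """Common-neighbors link-prediction heuristic: more shared
    neighbors suggests a missing edge between a and b."""
    return len(neighbors(graph_triples, a) & neighbors(graph_triples, b))
```

Here the two cathode materials share a neighbor and so score higher for a candidate link than an unrelated pair, hinting at how a structured graph surfaces connections that raw text does not.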
- Asia > China > Hong Kong (0.05)
- Oceania > Australia > New South Wales > Kensington (0.05)
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Energy > Renewable (0.69)
- Energy > Energy Storage (0.68)
3 Key Areas Where Nanotechnology Is Impacting Our Future
We are living amid a technological revolution that is transforming the globe. Changes are visible in all aspects of our lives, from transportation and health to communications. As the adage states, yesterday's science fiction is today's science. We are now expanding our capabilities in every area of science: chemistry, biology, physics, and engineering. That includes expanded space exploration, as well as building smart cities and new manufacturing hubs and developing artificial intelligence and quantum technologies. The rapid pace of technological change is clearly visible, but much of what you may not see, the exceedingly small physical components of change called nanotechnologies, is catalyzing the revolution. While there are many nanotech uses, three areas of nanotech are paving the way to our future: Materials Science, Nanomedicine, and Device Engineering.
- North America > United States > Pennsylvania (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > California (0.04)
- Europe > Netherlands > South Holland > The Hague (0.04)
- Information Technology > Security & Privacy (0.99)
- Health & Medicine > Health Care Technology (0.94)
- Government > Regional Government (0.94)
- Health & Medicine > Therapeutic Area (0.69)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Security & Privacy (0.99)
- Information Technology > Hardware (0.89)
Machine Learning Speeds up Simulations in Material Science
Research, development, and production of novel materials depend heavily on the availability of fast and at the same time accurate simulation methods. Machine learning, in which artificial intelligence (AI) autonomously acquires and applies new knowledge, will soon enable researchers to develop complex material systems in a purely virtual environment. How does this work, and which applications will benefit? In an article published in the Nature Materials journal, a researcher from Karlsruhe Institute of Technology (KIT) and his colleagues from Göttingen and Toronto explain it all. Digitization and virtualization are becoming increasingly important in a wide range of scientific disciplines.
- North America > Canada > Ontario > Toronto (0.41)
- Europe > Germany > Lower Saxony > Gottingen (0.29)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.28)
DataScience Digest -- 17.06.21
The never-ending fight with bias, and AI systems that learn by watching YouTube. The EU mobilizes to rein in tech giants. Facebook has migrated all of its AI systems to PyTorch: within a year, more than 1,700 PyTorch-based inference models are in full production at Facebook, and 93 percent of its new training models are on PyTorch. The times are hardly perfect for self-driving car companies.
- North America > Canada > Ontario > Toronto (0.05)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.05)
- Information Technology (0.91)
- Health & Medicine > Therapeutic Area > Oncology (0.49)
Leveraging Generative Adversarial Networks for Material Sciences
Material scientists often face the challenge of effectively searching the vast chemical design space to locate materials with their desired properties. To address this challenge, many scientists have turned to artificial intelligence in the race to discover new and advanced materials. A generative adversarial network (GAN) is a machine learning framework that leverages the idea of "adversarial training," in which two networks, a generator and a discriminator, are trained against each other. The idea originates from game theory and was introduced to the machine learning community in 2014 by Ian J. Goodfellow. With this in mind, a targeted strategy for developing novel chemical compositions is to develop sampling algorithms that can exploit both explicit chemical knowledge and implicit rules of composition embodied in a large database of materials.
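The adversarial training referenced above is usually stated as a two-player minimax game. The standard objective from Goodfellow et al. (2014), quoted here as background rather than anything specific to materials applications, is:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D is trained to tell real samples x from generated ones G(z), while the generator G is trained to fool it; at equilibrium, generated samples (here, candidate chemical compositions) become indistinguishable from the training distribution.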